FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances

Authors

  • Catharine Wyss
  • Chris Giannella
  • Edward Robertson
Abstract

Discovering functional dependencies (FDs) from an existing relation instance is an important technique in data mining and database design. To date, even the most efficient solutions are exponential in the number of attributes of the relation (n), even when the size of the output is not exponential in n. Lopes et al. developed an algorithm, Dep-Miner, that works well for large n on randomly-generated integer-valued relation instances [LPL 00a]. Dep-Miner first reduces the FD discovery problem to that of finding minimal covers for hypergraphs, then employs a level-wise search strategy to determine these minimal covers. Our algorithm, FastFDs, instead employs a depth-first, heuristic-driven search strategy for generating minimal covers of hypergraphs. This type of search is commonly used to solve search problems in Artificial Intelligence (AI) [RN 95]. Our experimental results indicate that the level-wise strategy that is the hallmark of many successful data mining algorithms is in fact significantly surpassed by the depth-first, heuristic-driven strategy FastFDs employs, due to the inherent space efficiency of the search. Furthermore, we revisit the comparison between Dep-Miner and Tane, including FastFDs. We report several tests on distinct benchmark relation instances, comparing the Dep-Miner and FastFDs hypergraph approaches to Tane's partitioning approach for mining FDs from a relation instance. At the end of the paper (appendix A) we provide experimental data comparing FastFDs with a third algorithm, fdep [FS 99].
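
The reduction the abstract describes can be illustrated with a small sketch. It is not the authors' implementation: the function names (`difference_sets`, `minimal_covers`, `mine_fds`) are hypothetical, the pairwise comparison of rows is a naive stand-in for whatever the paper uses to build the hypergraph, and the ordering heuristic (prefer the attribute that hits the most still-uncovered sets) is an assumption about what "heuristic-driven" might look like. The idea is that, for a right-hand-side attribute A, each minimal set of attributes that intersects every difference set containing A gives the left-hand side of a minimal FD.

```python
from itertools import combinations


def difference_sets(rows, attrs):
    """Collect the attribute sets on which some pair of rows disagrees."""
    diffs = set()
    for r, s in combinations(rows, 2):
        d = frozenset(a for a, x, y in zip(attrs, r, s) if x != y)
        if d:
            diffs.add(d)
    return diffs


def minimal_covers(sets):
    """Depth-first search for every minimal attribute set hitting all of `sets`."""
    sets = list(sets)
    found = []

    def is_minimal(cover):
        # Dropping any single attribute must leave some set uncovered.
        return all(any(not (s & (cover - {a})) for s in sets) for a in cover)

    def search(path, uncovered, candidates):
        if not uncovered:
            if is_minimal(path):
                found.append(path)
            return
        # Assumed heuristic: attributes covering more uncovered sets go first.
        ranked = sorted(
            (a for a in candidates if any(a in s for s in uncovered)),
            key=lambda a: (-sum(a in s for s in uncovered), a),
        )
        for i, a in enumerate(ranked):
            # Only attributes ranked after `a` may extend this branch,
            # so no attribute set is visited twice.
            search(path | {a}, [s for s in uncovered if a not in s], ranked[i + 1:])

    search(frozenset(), sets, sorted(set().union(*sets)))
    return found


def mine_fds(rows, attrs):
    """Return minimal FDs as (lhs_tuple, rhs_attribute) pairs."""
    diffs = difference_sets(rows, attrs)
    fds = []
    for a in attrs:
        # Difference sets that mention `a`, with `a` itself removed.
        d_a = {d - {a} for d in diffs if a in d}
        for lhs in minimal_covers(d_a):
            fds.append((tuple(sorted(lhs)), a))
    return fds


rows = [(1, "x", 10), (1, "y", 20), (2, "x", 10), (2, "y", 20)]
print(mine_fds(rows, ["A", "B", "C"]))
```

In this toy relation, columns B and C determine each other while A is independent of both, so the sketch reports B → C and C → B and nothing with A on the right-hand side. The depth-first search keeps only the current path and its uncovered sets in memory, which is the kind of space behaviour the abstract contrasts with level-wise search.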

Similar resources

FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract

Discovering functional dependencies (FDs) from an existing relation instance is an important technique in data mining and database design. To date, even the most efficient solutions are exponential in the number of attributes of the relation (n), even when the size of the output is not exponential in n. Lopes et al. developed an algorithm, Dep-Miner, that works well for large n on randomly-genera...


Ordering Depth First Search to Improve AFD Mining

This paper describes a new search algorithm, bottom-up attribute keyness depth-first search (BU-AKD), for mining powerset lattices with the use of a monotonic approximation measure; characteristics present in many problem domains. The research reported here focuses on one of these problem domains, the discovery of Approximate Functional Dependencies (AFDs). AFDs are measured versions of functio...


Resampling in an Indefinite Database to Approximate Functional Dependencies Research Note RN/98/10

We reintroduce Numerical Dependencies (NDs), defined originally to enhance database design, within a data mining context where we use ND sets to approximate the satisfaction of a given Functional Dependency (FD) set within a relation. We motivate NDs by examining the use of indefinite information in relations. Indefinite information is represented within the relational model by allowing cells to c...


Heuristic and exact algorithms for Generalized Bin Covering Problem

In this paper, we study the Generalized Bin Covering problem. For this problem, an exact algorithm is introduced that can find an optimal solution for small-scale instances. To find a near-optimal solution for large-scale instances, a heuristic algorithm is proposed. The efficiency of the heuristic algorithm is assessed through computational experiments.


Two Strategies Based on Meta-Heuristic Algorithms for Parallel Row Ordering Problem (PROP)

Proper arrangement of the facility layout is a key issue in management that influences the efficiency and profitability of manufacturing systems. The Parallel Row Ordering Problem (PROP) is a special case of the facility layout problem and consists of looking for the best location of n facilities while similar facilities (facilities which have some characteristics in common) should be arranged in a row ...


Publication date: 2001